Abstract

An important classical result in Information Theory states that Gaussian noise is the worst-case
additive noise in point-to-point channels. In this paper, we significantly generalize this result
and show that Gaussian noise is also the worst-case additive noise in general wireless networks
with additive noises that are independent of the transmit signals. More specifically, we prove that,
given a coding scheme for an AWGN network, one can build a coding scheme that achieves the same
rates on an additive noise wireless network with the same topology, where the noise terms may have
any distribution with the same mean and variance as in the AWGN network.



1 Introduction 

The modeling of background noise in point-to-point wireless channels as an additive Gaussian noise is 
well supported from both theoretical and practical viewpoints. In practice, we have witnessed that cur- 
rent wireless systems that were designed based on the assumption of additive Gaussian noise perform 
quite well. This is intuitively explained by the fact that, from the Central Limit Theorem, the composite 
effect of many (almost) independent noise sources (e.g., thermal noise, shot noise, etc.) should approach 
a Gaussian distribution. From a theoretical point of view, Gaussian noise has been proven to be the
worst-case noise for additive noise channels. This follows mainly from the fact that the Gaussian
distribution maximizes the entropy subject to a variance constraint. More precisely, from the Channel
Coding Theorem [1], the capacity of a channel $f(y|x)$ is given by

\[ C = \max_{f(x)\,:\,E[X^2] \le P} I(X;Y). \tag{1} \]

Thus, if we choose X to be distributed as $\mathcal{N}(0, P)$, we have that

\[ C \ge h(X) - h(X|Y) = \frac{1}{2}\log(2\pi e P) - h(X|Y). \]

In the case of an additive noise (AN) channel $Y = X + Z$, where $E[Z] = 0$ and $E[Z^2] = \sigma^2$, the fact
that the Gaussian distribution maximizes the entropy implies that
$h(X|Y) \le \frac{1}{2}\log\left(2\pi e \frac{P\sigma^2}{P+\sigma^2}\right)$. We conclude that

\[ C_{AN} \ge \frac{1}{2}\log\left(1 + \frac{P}{\sigma^2}\right) = C_{AWGN}, \]

where $C_{AWGN}$ is the capacity of the AWGN channel, which is achieved by a Gaussian input distribution.
Moreover, a more operational justification of the fact that Gaussian is the worst-case noise for additive
noise channels was provided in [2], where it was shown that random Gaussian codebooks and
nearest-neighbor decoding achieve the capacity of the corresponding AWGN channel on a non-Gaussian AN
channel.
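
The entropy bound used in the argument above can be spelled out in one more step. The following is a standard sketch, assuming X is zero-mean with variance P and independent of Z; the linear estimation coefficient $\alpha$ is a symbol introduced only for this illustration.

```latex
\begin{align*}
h(X|Y) &= h(X - \alpha Y \mid Y) \;\le\; h(X - \alpha Y)
        \;\le\; \tfrac{1}{2}\log\bigl(2\pi e\,\mathrm{Var}(X - \alpha Y)\bigr), \\
\mathrm{Var}(X - \alpha Y) &= (1-\alpha)^2 P + \alpha^2\sigma^2
        = \frac{P\sigma^2}{P+\sigma^2}
        \quad\text{for } \alpha = \frac{P}{P+\sigma^2}.
\end{align*}
```

Substituting this into $C \ge \frac{1}{2}\log(2\pi e P) - h(X|Y)$ gives $\frac{1}{2}\log\left(1 + P/\sigma^2\right)$, regardless of the distribution of Z.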

Once we go beyond point-to-point channels, Gaussian noise is only known to be the worst-case 
additive noise in some special wireless networks, such as the Multiple Access Channel, the Degraded 
Broadcast Channel and MIMO channels. In all such cases the capacity has been fully characterized and 
is known to be achievable with Gaussian inputs. Therefore, similar arguments to the one above can 
be used to show that, in these cases, Gaussian noise is indeed the worst-case additive noise. However, 
for more general wireless networks where the capacity is unknown, we lack the tools to make such an 
assertion. The recent constant-gap capacity approximations for the Interference Channel [3] and for 
single-source single-destination relay networks [4-6] can only be used to state that Gaussian noise is 
"approximately" the worst-case additive noise in these cases. Nonetheless, in a leap of faith, most of the 
research concerning such systems and many other wireless networks views the AWGN channel model as 
the standard wireless link model. As a result, it remains a fundamental open question whether Gaussian 
noise is the worst-case additive noise in general wireless networks. 

In this work, we answer this question by showing that the Gaussian noise is in fact the worst-case 
noise for arbitrary wireless networks with additive noises that are independent of the transmit signals. 
We consider wireless networks with unrestricted topologies and general traffic demands. We show 
that any coding scheme that achieves a given set of rates on a network with Gaussian additive noises 
can be used to construct a coding scheme that achieves the same set of rates on a network with the same
topology and traffic demands, but with non-Gaussian additive noises. It is also important to notice that
our coding scheme construction only depends on the mean and variance of the noise distributions of 
our non-Gaussian network, and is oblivious to their precise statistics. This means that our approach also 
results in a framework to design codes for networks with unknown noise distributions with an asymptotic 
performance guarantee. 

We prove that the Gaussian noise is the worst-case noise in wireless networks based on two main 
results. The first one is that, given a coding scheme with finite reading precision for an AWGN network, 
one can build a coding scheme that achieves the same rates on a non-Gaussian wireless network. A 
coding scheme is said to have finite reading precision if, for any node, its transmit signals only depend 
on its received signals read up to a finite number of digits after the decimal point. This result is proven 
in three main steps. We start by applying a transformation at the transmit signals and received signals 
of all nodes in the network in order to create an "approximately Gaussian" effective network. The 
technique resembles OFDM in that it uses the Discrete Fourier Transform in order to mix together 
multiple uses of the same channel. This mixing causes the additive noise terms from distinct network 
uses to be averaged over time and, by making use of Lindeberg's Central Limit Theorem [7], it can be 
shown that the resulting effective noise is approximately Gaussian in the distribution sense. Thus, we 
create an approximately Gaussian network with dependent noises, since the mixing causes distinct noise 
realizations at the same receiver to be dependent on each other. The second step is a combination of
an interleaving technique and a random outer code, which allows us to handle this dependence among 
the noise realizations. The interleaving operation creates multiple blocks of network uses inside which 
the additive noises are i.i.d. and almost normally-distributed. However, in order to be able to apply the 
original coding scheme that we have for the AWGN network in each of these blocks, we need to make 
sure that its error probability will not change much when the noise distributions are only approximately 
Gaussian. This can be done since we require the original coding scheme to have finite reading precision. 
For such coding schemes, the sets of noise realizations have a special structure and can be shown to be 
continuity sets. It follows from the Portmanteau Theorem that the coding scheme's performance on an
almost-Gaussian network does not deviate much from its performance on an actual Gaussian network.

The second main result we need is that, for any wireless network, the capacity when we restrict
ourselves to coding schemes with finite reading precision, and allow the precision to tend to infinity along
the sequence of coding schemes, is the same as the unrestricted capacity. To prove this we first show
that, for any coding scheme with infinite precision, there exists a quantization scheme of the received 
signals which does not increase the error probability of the coding scheme too much. This is done by 
showing that a truncation of the bit expansion of the received signal followed by a random shift performs 
well; thus, there must exist a fixed shift for each node which guarantees the same performance. This 
quantization operation makes the coding scheme have finite reading precision, and the result follows. 

The paper is organized as follows. In section 2, we describe the network model and introduce the 
necessary terminology. We start by focusing on wireless networks with |L| unicast sessions, which
makes the proofs simpler and easier to follow. In section 3, we state our main result (Theorem 1) and 
the two main theorems that are needed for it. Theorem 2 states that coding schemes with finite reading 
precision can be used to construct coding schemes for non-Gaussian networks. Theorem 3 states that 
coding schemes with infinite reading precision can be "quantized" yielding coding schemes with finite 
reading precision that perform almost as well. The proof of Theorem 2 is broken into three different 
sections as follows. We first describe the OFDM-like scheme in subsection 3.1. Then, in section 3.2, 
we show that the additive noises obtained from the OFDM-like scheme in fact converge in distribution 
to Gaussian noises. In section 3.3, we describe the interleaving technique and the outer code that are 
used to handle the dependence between the noises after the OFDM-like scheme, and we show how 
the requirement of finite reading precision can be used to show that our coding scheme designed for 
a Gaussian network can be applied to an almost-Gaussian network without much loss in performance.
The proof of Theorem 3 is in section 3.4. In section 4, we describe how we can modify the arguments 
in the previous sections in order to consider, instead of |L|-unicast wireless networks, wireless networks 
with general traffic demands. We conclude the paper in section 5. 

2 Problem Setup and Definitions 

An |L|-unicast additive noise wireless network (G, L) consists of a directed graph G = (V, E), where
V is the vertex (or node) set and $E \subseteq V \times V$ is the edge set, and a set $L \subseteq V \times V$ of source-destination
pairs. We assume throughout that all sources and destinations are distinct nodes, although it is possible
to strengthen our results to more general settings, as done in section 4. All nodes in V which are not
sources function as relays. We associate a real-valued channel gain $h_{u,v}$ with each edge $(u, v) \in E$.

Communication in a multiple-unicast wireless network is performed over a block of n discrete time
steps. At time t = 1, 2, ..., n, each node $u \in V$ transmits a real-valued signal $X_u[t]$, which must satisfy
an average power constraint $\frac{1}{n}\sum_{t=1}^{n} X_u^2[t] \le P$, for all $u \in V$, for some fixed P > 0. The signal received
by node v at time t is given by

\[ Y_v[t] = \sum_{u \in \mathcal{I}(v)} h_{u,v} X_u[t] + N_v[t], \tag{2} \]

where $\mathcal{I}(v) = \{u \in V : (u, v) \in E\}$, and the additive noise $N_v$ is assumed to be i.i.d. over time and
satisfies $E[N_v] = 0$ and $E[N_v^2] = \sigma_v^2 < \infty$. We also assume that the noise terms are independent
of all transmit signals and of all noise terms at distinct nodes, and that each $N_v$ has an absolutely
continuous distribution. If all the additive noises in the network are normal $\mathcal{N}(0, \sigma_v^2)$, then we say the
network is an AWGN network.
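
As a small illustration of the channel model (2), the sketch below evaluates the received signals for all nodes at once. The three-node line network, its gains, and the uniform noise distribution are hypothetical choices made only for this example.

```python
import numpy as np

def received_signals(H, X, N):
    """Evaluate (2): Y_v[t] = sum_u h_{u,v} X_u[t] + N_v[t].

    H[u, v] is the gain of edge (u, v), and 0 where there is no edge.
    X and N have shape (num_nodes, n): one row per node, one column per time step.
    """
    return H.T @ X + N

rng = np.random.default_rng(0)
# Hypothetical line network 0 -> 1 -> 2 with arbitrary gains.
H = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, 1.2],
              [0.0, 0.0, 0.0]])
n, sigma = 4, 1.0
X = rng.standard_normal((3, n))
# Non-Gaussian noise: uniform on [-sqrt(3)*sigma, sqrt(3)*sigma] has zero mean
# and variance sigma^2, so it matches the first two moments of N(0, sigma^2).
N = rng.uniform(-np.sqrt(3) * sigma, np.sqrt(3) * sigma, size=(3, n))
Y = received_signals(H, X, N)
```

Node 0 has no incoming edges here, so its received signal is pure noise, while node 1 observes a scaled copy of node 0's transmission plus noise.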

Definition 1. A coding scheme C with block length $n \in \mathbb{N}$ and rate tuple $R = (R_1, \ldots, R_{|L|}) \in \mathbb{R}^{|L|}$ for
an |L|-unicast additive noise wireless network consists of:

1. An encoding function $f_i : \{1, \ldots, 2^{nR_i}\} \to \mathbb{R}^n$ for each source $s_i$, i = 1, ..., |L|, where each
codeword $f_i(w_i)$, $w_i \in \{1, \ldots, 2^{nR_i}\}$, satisfies an average power constraint of P.
2. Relaying functions $r_v^{(t)} : \mathbb{R}^{t-1} \to \mathbb{R}$, for t = 1, ..., n, for each relay $v \in V$ that is not a source,
satisfying the average power constraint

\[ \frac{1}{n}\sum_{t=1}^{n} \left[ r_v^{(t)}(y_1, \ldots, y_{t-1}) \right]^2 \le P \quad \text{for all } (y_1, \ldots, y_{n-1}) \in \mathbb{R}^{n-1}. \]

3. A decoding function $g_i : \mathbb{R}^n \to \{1, \ldots, 2^{nR_i}\}$ for each destination $d_i$, i = 1, ..., |L|.

Definition 2. The error probability of a coding scheme C (as defined in Definition 1) is given by

\[ P_{error}(C) = \Pr\left[ \bigcup_{i=1}^{|L|} \left\{ W_i \ne g_i(Y_{d_i}[1], \ldots, Y_{d_i}[n]) \right\} \right], \]

where the message transmitted by source $s_i$, $W_i$, is assumed to be chosen uniformly at random from
$\{1, \ldots, 2^{nR_i}\}$, for i = 1, ..., |L|.

Definition 3. A rate tuple R is said to be achievable for an |L|-unicast wireless network (G, L) if there
exists a sequence of coding schemes $C_n$ with rate tuple R and block length n, for which $P_{error}(C_n) \to 0$
as $n \to \infty$. The sequence of coding schemes $C_n$, n = 1, 2, ..., is then said to achieve rate tuple R. The
capacity region of an |L|-unicast wireless network is the closure of the set of achievable rate tuples.

We will first focus on coding schemes that have finite reading precision. Then we will show that 
coding schemes with infinite reading precision can be converted into coding schemes with finite reading 
precision without much loss in performance. 

Definition 4. A coding scheme C is said to have finite reading precision $p \in \mathbb{N}$ if the transmit signal of
each (non-source) node v in the network at each time t only depends on

\[ \lfloor Y_v[i] \rfloor_p = 2^{-p} \lfloor 2^p Y_v[i] \rfloor, \quad \text{for } i = 1, \ldots, t-1, \]

as opposed to the complete binary expansion of $Y_v[i]$.

Definition 5. Rate tuple R is achievable by coding schemes with finite reading precision if we have a
sequence of coding schemes $C_n$, where coding scheme $C_n$ has finite reading precision $p_n$, which achieves
rate tuple R according to Definition 3.

Remark: Notice that we allow the precision $p_n$ to vary arbitrarily along the sequence of codes, and it
may be the case that $p_n \to \infty$ as $n \to \infty$.
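
The reading operation in Definition 4 keeps p binary digits after the point and always rounds down. A minimal sketch (the function name `read_with_precision` is chosen here for illustration):

```python
import math

def read_with_precision(y: float, p: int) -> float:
    """Return 2**-p * floor(2**p * y), the value of y read with precision p."""
    return math.floor(y * 2**p) / 2**p

print(read_with_precision(0.8125, 2))  # 0.8125 = 0.1101 in binary, read as 0.11
print(read_with_precision(-0.3, 1))    # negative values are rounded down as well
```

A relay with reading precision p cannot distinguish two received values that fall in the same interval of length $2^{-p}$, which is what later gives the error sets their structure.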



3 Main Result 

Our main result is to show that any rates that are achievable on a network where each $N_v$ is
normally-distributed for each $v \in V$ are achieved on a network where each $N_v$ has an arbitrary absolutely
continuous distribution with the same mean and variance. More precisely, our main result is the following
theorem.

Theorem 1 (Main Result). From a sequence of coding schemes that achieve rate tuple R on an AWGN
|L|-unicast wireless network (G, L), it is possible to construct a single sequence of coding schemes that
achieves rates arbitrarily close to R on the same |L|-unicast wireless network (G, L), where, for each relay v,
the distribution of $N_v$ is replaced with any absolutely continuous distribution satisfying $E[N_v] = 0$ and
$E[N_v^2] = \sigma_v^2$. Therefore, if $C_{AWGN}$ is the capacity region of the AWGN |L|-unicast wireless network
(G, L), and $C_{non-AWGN}$ the capacity region of the same wireless network (G, L) where, for each relay
v, the distribution of $N_v$ is replaced with an absolutely continuous distribution satisfying $E[N_v] = 0$
and $E[N_v^2] = \sigma_v^2$, then

\[ C_{AWGN} \subseteq C_{non-AWGN}. \]

Theorem 1 is proved in the remainder of this section. We will prove it through the following two
auxiliary results.

Theorem 2. Suppose a rate tuple R is achievable by coding schemes with finite reading precision on an AWGN
wireless network (G, L). Then it is possible to construct a single sequence of coding schemes that
achieves rates arbitrarily close to R on the same |L|-unicast additive noise wireless network (G, L) where,
for each relay v, the distribution of $N_v$ is replaced with any absolutely continuous distribution
satisfying $E[N_v] = 0$ and $E[N_v^2] = \sigma_v^2$.

Theorem 3. Suppose we have a sequence of coding schemes $C_n$ achieving a rate tuple R on a wireless
network (G, L). Then it is possible to construct a sequence of coding schemes $C_n^*$ with finite reading
precision that also achieves R on the same wireless network.

It is clear that by combining Theorems 2 and 3, Theorem 1 will follow. To prove Theorem 2, we 
start by assuming that we have a sequence of coding schemes with finite reading precision designed to 
achieve a rate tuple R on an AWGN network. Then, through a series of steps, we will use this sequence 
of coding schemes to construct another sequence of coding schemes that achieves arbitrarily close to the 
rate tuple R on the corresponding network where the additive noises are not Gaussian. 

A diagram illustrating the proof steps of Theorem 2 is shown in Figure 1. We start by describing an
OFDM-like scheme that is applied to all nodes in the network. The main idea is that, by applying an 
Inverse Discrete Fourier Transform (IDFT) to the block of transmit signals of each node, and a Discrete 
Fourier Transform (DFT) to the block of received signals of each node, we create effective additive noise 
terms that are weighted averages of the additive noise realizations during that block. We describe this 
procedure in detail in section 3.1. Then, in section 3.2, we show that this mixture of noises converges 
in distribution to a Gaussian additive noise term. This is done by showing that the weighted average 
of the noise realizations satisfies Lindeberg's Central Limit Theorem Condition [7]. Therefore, the 
OFDM-like scheme effectively produces a network where the noises at each node are dependent across 
time and approximately Gaussian. The dependence across time is undesirable since our original coding 
scheme designed for the AWGN network assumed that the additive noise at each receiver is i.i.d. over 
time. To overcome this problem, in section 3.3, we apply the OFDM-like scheme over multiple blocks, 
and then we interleave the effective network uses from distinct blocks. This effectively creates several 
blocks in which the network behaves as an Approximately AWGN network (with i.i.d. noises). Then 
our original code for the AWGN network can be applied to each approximately AWGN block. The fact 
that this code has finite reading precision guarantees that, when applied to the approximately AWGN 
block, its error probability is close to its error probability on the AWGN network. More formally, this
means that, for any choice of messages $w \in \prod_{i=1}^{|L|}\{1, \ldots, 2^{kR_i}\}$, the probability that the joint noise
realization Z belongs to the error set $A_w$ (i.e., causes an error to occur) is approximately the same in
both cases, which follows from the fact that $A_w$ can be shown to be a continuity set. Finally, we take
care of the dependence between the noises of different blocks created in the interleaving operation by 
using a random outer code for each source-destination pair. Then we can show via a mutual-information 
argument that we can achieve a rate tuple arbitrarily close to R on the non-Gaussian wireless network. 

In section 3.4, we prove Theorem 3. The main idea is to show that, given a coding scheme with
infinite reading precision, there exists a set of quantization mappings, one for each node in the network,
such that, if each node quantizes its received signal before applying the relaying or decoding function,
the change in the error probability is arbitrarily small.



[Figure 1 (diagram). Chain of networks: non-Gaussian i.i.d. noise network, then (OFDM-like scheme)
approximately Gaussian dependent noise network, then (interleaving) approximately AWGN per block
network with dependent noise across blocks, on which the outer code achieves the same rate as on the
approximately AWGN network. From Lemma 2, codes with finite reading precision perform similarly on
the approximately AWGN network and on the AWGN network.]

Figure 1: Diagram of proof steps of Theorem 2.



3.1 An OFDM-like scheme to mix the noises over time 



We use an approach similar to OFDM in order to create an effective network with additive noises that
are as close to normally-distributed as we wish. Let us assume that a node $u \in V$ has b real-valued
signals $d_0, d_1, \ldots, d_{b-1}$ which are the inputs to the effective channels we intend to create. We assume
that b is even, to simplify the expressions. Then node u "packs" these signals into b complex numbers
$\tilde{d}_0, \ldots, \tilde{d}_{b-1}$ as follows.

\[
\begin{aligned}
\tilde{d}_0 &= d_0, \\
\tilde{d}_i &= d_{2i-1} + j\, d_{2i} && \text{for } i = 1, \ldots, \tfrac{b}{2} - 1, \\
\tilde{d}_{b/2} &= d_{b-1}, \\
\tilde{d}_i &= \tilde{d}_{b-i}^{\,*} && \text{for } i = \tfrac{b}{2} + 1, \ldots, b-1.
\end{aligned}
\]

Next, node u takes the IDFT of the vector $\tilde{\mathbf{d}}_u = (\tilde{d}_0, \ldots, \tilde{d}_{b-1})$ to obtain the vector
$\mathbf{X}_u = \mathrm{IDFT}(\tilde{\mathbf{d}}_u)$.

Throughout the paper, we assume that DFT and IDFT refer to the unitary version of the DFT and IDFT.
Since $\tilde{\mathbf{d}}_u$ is conjugate symmetric, $\mathbf{X}_u$ is a real vector (in $\mathbb{R}^b$). Moreover, we will require the original
real-valued signals to satisfy



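\[
\begin{aligned}
\mathrm{avg}\left[d_0^2\right] &\le P, && (3) \\
\mathrm{avg}\left[d_i^2\right] &\le P/2, \quad \text{for } i = 1, \ldots, b-2, && (4) \\
\mathrm{avg}\left[d_{b-1}^2\right] &\le P, && (5)
\end{aligned}
\]

where the avg operator refers to time average; i.e., if each $d_i$ is seen as a stream of signals $d_i[1], \ldots, d_i[k]$,
then $\mathrm{avg}[d_i] = \frac{1}{k}\sum_{t=1}^{k} d_i[t]$. Then we must have, by Parseval's relationship,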



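\[
\frac{1}{b}\,\mathrm{avg}\left[\|\mathbf{X}_u\|^2\right]
= \frac{1}{b}\sum_{i=0}^{b-1} \mathrm{avg}\left[|\tilde{d}_i|^2\right]
= \frac{1}{b}\left( \mathrm{avg}\left[d_0^2\right] + \mathrm{avg}\left[d_{b-1}^2\right]
  + 2\sum_{i=1}^{b/2-1} \mathrm{avg}\left[d_{2i-1}^2 + d_{2i}^2\right] \right) \le P.
\]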



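Therefore, u may transmit k vectors $\mathbf{X}_u$, each one over b time-slots, and the output power constraint
over the block n = kb will be satisfied. A node v will receive, over each sequence of b time-slots,

\[ \mathbf{Y}_v = \sum_{u \in \mathcal{I}(v)} h_{u,v} \mathbf{X}_u + \mathbf{N}_v. \]

By applying a DFT to each block of b received signals, node v will obtain

\[ \tilde{\mathbf{Y}}_v = \mathrm{DFT}(\mathbf{Y}_v) = \sum_{u \in \mathcal{I}(v)} h_{u,v} \tilde{\mathbf{d}}_u + \mathrm{DFT}(\mathbf{N}_v). \]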

This transformation is illustrated in Figure 2. 





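[Figure 2 (diagram). Example network in which node 4 obtains
$\tilde{\mathbf{Y}}_4 = \sum_i h_{i,4} \tilde{\mathbf{d}}_i + \mathrm{DFT}(\mathbf{N}_4)$.]

Figure 2: An example of an effective network after OFDM-like scheme.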
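
The transmitter side of the OFDM-like scheme can be sketched numerically as follows. This is an illustration under assumptions, not the authors' implementation: the function names `pack` and `ofdm_transmit` are invented here, and NumPy's `norm="ortho"` option is used to obtain the unitary IDFT.

```python
import numpy as np

def pack(d):
    """Pack b real signals (b even) into a conjugate-symmetric complex vector."""
    b = len(d)
    half = b // 2
    dt = np.zeros(b, dtype=complex)
    dt[0] = d[0]
    for i in range(1, half):
        dt[i] = d[2 * i - 1] + 1j * d[2 * i]
    dt[half] = d[b - 1]
    for i in range(half + 1, b):
        dt[i] = np.conj(dt[b - i])
    return dt

def ofdm_transmit(d):
    """Unitary IDFT of the packed vector; returns the real part and the largest
    imaginary residue (which should be pure rounding error)."""
    x = np.fft.ifft(pack(d), norm="ortho")
    return x.real, float(np.max(np.abs(x.imag)))

b, P = 8, 1.0
rng = np.random.default_rng(1)
# Amplitudes that meet (3)-(5) with equality: power P for d_0 and d_{b-1},
# power P/2 for all the other signals; signs are random.
amp = np.full(b, np.sqrt(P / 2))
amp[0] = amp[-1] = np.sqrt(P)
d = rng.choice([-1.0, 1.0], size=b) * amp

x, max_imag = ofdm_transmit(d)
avg_power = np.mean(x ** 2)   # per-time-slot transmit power over the block
```

With the constraints (3)-(5) met with equality, `avg_power` comes out equal to P, matching the Parseval computation, and applying the unitary DFT to `x` returns the packed vector.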

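Next, by looking at each component of $\tilde{\mathbf{Y}}_v$, we notice that we have effectively b complex-valued
received signals. The additive noise on the i-th received signal is given by

\[
\mathrm{DFT}(\mathbf{N}_v)_i = \frac{1}{\sqrt{b}} \sum_{\ell=0}^{b-1} N_v[\ell]\, e^{-j 2\pi i \ell / b}
= \frac{1}{\sqrt{b}} \sum_{\ell=0}^{b-1} N_v[\ell] \cos\left(\frac{2\pi i \ell}{b}\right)
- j\, \frac{1}{\sqrt{b}} \sum_{\ell=0}^{b-1} N_v[\ell] \sin\left(\frac{2\pi i \ell}{b}\right). \tag{6}
\]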



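By considering the real and imaginary parts of each component $\tilde{Y}_{v,i}$ of $\tilde{\mathbf{Y}}_v$, for i = 0, ..., b-1,
separately, we obtain the following 2b - 2 effective received signals:

\[
\begin{aligned}
\text{(I)}\quad & \tilde{Y}_{v,0} = \sum_{u \in \mathcal{I}(v)} h_{u,v}\, d_{u,0} + \mathrm{DFT}(\mathbf{N}_v)_0 \\
\text{(II)}\quad & \Re\left[\tilde{Y}_{v,i}\right] = \sum_{u \in \mathcal{I}(v)} h_{u,v}\, d_{u,2i-1} + \Re\left[\mathrm{DFT}(\mathbf{N}_v)_i\right] && \text{for } i = 1, \ldots, \tfrac{b}{2}-1 \\
\text{(III)}\quad & \Im\left[\tilde{Y}_{v,i}\right] = \sum_{u \in \mathcal{I}(v)} h_{u,v}\, d_{u,2i} + \Im\left[\mathrm{DFT}(\mathbf{N}_v)_i\right] && \text{for } i = 1, \ldots, \tfrac{b}{2}-1 \\
\text{(IV)}\quad & \tilde{Y}_{v,b/2} = \sum_{u \in \mathcal{I}(v)} h_{u,v}\, d_{u,b-1} + \mathrm{DFT}(\mathbf{N}_v)_{b/2} \\
\text{(V)}\quad & \Re\left[\tilde{Y}_{v,i}\right] = \sum_{u \in \mathcal{I}(v)} h_{u,v}\, d_{u,2(b-i)-1} + \Re\left[\mathrm{DFT}(\mathbf{N}_v)_i\right] && \text{for } i = \tfrac{b}{2}+1, \ldots, b-1 \\
\text{(VI)}\quad & \Im\left[\tilde{Y}_{v,i}\right] = -\sum_{u \in \mathcal{I}(v)} h_{u,v}\, d_{u,2(b-i)} + \Im\left[\mathrm{DFT}(\mathbf{N}_v)_i\right] && \text{for } i = \tfrac{b}{2}+1, \ldots, b-1
\end{aligned}
\]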


However, from the conjugate symmetry of $\mathrm{DFT}(\mathbf{N}_v)$ (since $\mathbf{N}_v$ is a real-valued vector), we have that
$\Re[\mathrm{DFT}(\mathbf{N}_v)_i] = \Re[\mathrm{DFT}(\mathbf{N}_v)_{b-i}]$ and $\Im[\mathrm{DFT}(\mathbf{N}_v)_i] = -\Im[\mathrm{DFT}(\mathbf{N}_v)_{b-i}]$, for i = 1, 2, ..., b-1, and
all the received signals from (V) and (VI) are repetitions (up to a change of sign) of the received signals
in (II) and (III). Therefore, we conclude that we have effectively b distinct real-valued received signals
with additive noise. It is important to notice that the additive noise terms are dependent across these b
received signals.
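
The receiver-side bookkeeping can be checked numerically. The sketch below applies the unitary DFT to one block of real-valued non-Gaussian noise, confirms the conjugate symmetry, and extracts the b distinct real-valued terms in the order (I), (II)/(III), (IV). The helper name `unpack` and the Laplace noise are assumptions made for this illustration.

```python
import numpy as np

def unpack(y_tilde):
    """Extract the b distinct real-valued signals from a length-b DFT output (b even)."""
    b = len(y_tilde)
    half = b // 2
    out = np.empty(b)
    out[0] = y_tilde[0].real                 # (I)
    for i in range(1, half):
        out[2 * i - 1] = y_tilde[i].real     # (II)
        out[2 * i] = y_tilde[i].imag         # (III)
    out[b - 1] = y_tilde[half].real          # (IV)
    return out

b = 8
rng = np.random.default_rng(2)
noise = rng.laplace(size=b)                  # real-valued, non-Gaussian noise block
noise_dft = np.fft.fft(noise, norm="ortho")  # unitary DFT

# Entries b/2+1, ..., b-1 repeat entries 1, ..., b/2-1 up to the sign of the
# imaginary part, so (V) and (VI) carry no new information.
idx = np.arange(1, b)
symmetric = bool(np.allclose(noise_dft[idx], np.conj(noise_dft[b - idx])))
effective_noise = unpack(noise_dft)
```

All b entries of `effective_noise` are functions of the same b noise samples, which is the source of the dependence across effective received signals noted above.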

3.2 Noise mixture converges to Gaussian Noise

In this section, we show that the additive noise terms of the effective received signals we obtained in the 
previous section approximate a Gaussian distribution as b gets large. We will use the following classical 
result. 

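Theorem 4 (Lindeberg's Central Limit Theorem). Suppose that for each b = 1, 2, ..., the random
variables $Y_{b,1}, Y_{b,2}, \ldots, Y_{b,b}$ are independent. In addition, suppose that, for all b and $i \le b$, $E[Y_{b,i}] = 0$, and
let

\[ s_b^2 = \sum_{i=1}^{b} E\left[Y_{b,i}^2\right]. \tag{7} \]

Then, if for all $\epsilon > 0$, Lindeberg's condition

\[ \frac{1}{s_b^2} \sum_{i=1}^{b} E\left[ Y_{b,i}^2\, 1\{|Y_{b,i}| \ge \epsilon s_b\} \right] \to 0 \quad \text{as } b \to \infty \tag{8} \]

holds, we have that

\[ \frac{\sum_{i=1}^{b} Y_{b,i}}{s_b} \xrightarrow{d} \mathcal{N}(0, 1). \]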

Lindeberg's CLT can be used to prove the following lemma. 

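Lemma 1. Let N[0], N[1], N[2], ... be i.i.d. random variables that are zero-mean, have variance $\sigma^2$ and
have an absolutely continuous distribution, and let

\[ Z_b = \frac{1}{\sqrt{b}} \sum_{i=0}^{b-1} N[i] \cos\left(\frac{2\pi \ell i}{b}\right) \]

for some $\ell \in \{1, \ldots, b-1\} \setminus \{b/2\}$. Then, $Z_b$ converges in distribution to $\mathcal{N}(0, \sigma^2/2)$ as $b \to \infty$.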

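Proof. We start by letting $Y_{b,i+1} = N[i] \cos\left(\frac{2\pi \ell i}{b}\right)$, for i = 0, 1, ..., b-1. Then, by following (7), we
have

\[
\begin{aligned}
s_b^2 &= \sum_{i=1}^{b} E\left[Y_{b,i}^2\right] = \sum_{i=0}^{b-1} E\left[N[i]^2\right] \cos^2\left(\frac{2\pi \ell i}{b}\right)
       = \sigma^2 \sum_{i=0}^{b-1} \left( \frac{1}{2} + \frac{1}{2}\cos\left(\frac{4\pi \ell i}{b}\right) \right) \\
      &= \frac{\sigma^2 b}{2} + \frac{\sigma^2}{4} \sum_{i=0}^{b-1} \left( e^{j 4\pi \ell i / b} + e^{-j 4\pi \ell i / b} \right) \\
      &= \frac{\sigma^2 b}{2} + \frac{\sigma^2 \left(1 - e^{j 4\pi \ell}\right)}{4\left(1 - e^{j 4\pi \ell / b}\right)}
         + \frac{\sigma^2 \left(1 - e^{-j 4\pi \ell}\right)}{4\left(1 - e^{-j 4\pi \ell / b}\right)} = \frac{\sigma^2 b}{2}.
\end{aligned}
\]

The last equality follows because $e^{\pm j 4\pi \ell} = 1$ and $e^{\pm j 4\pi \ell / b} \ne 1$ for any $\ell \in \{1, \ldots, b-1\} \setminus \{b/2\}$. Next we
let $U_{b,i} = Y_{b,i}^2\, 1\{|Y_{b,i}| \ge \epsilon s_b\} = Y_{b,i}^2\, 1\{|Y_{b,i}| \ge \epsilon \sigma \sqrt{b/2}\}$. Consider any sequence $i_b$, for b = 1, 2, ...,
such that $i_b \in \{1, \ldots, b\}$, and any $\delta > 0$. Then we have that

\[
\begin{aligned}
\Pr\left(U_{b,i_b} < \delta\right) &\ge \Pr\left(|Y_{b,i_b}| < \epsilon \sigma \sqrt{b/2}\right) \\
  &\ge \Pr\left(|N[i_b - 1]| < \epsilon \sigma \sqrt{b/2}\right) \\
  &= \Pr\left(|N[1]| < \epsilon \sigma \sqrt{b/2}\right) \to 1, \quad \text{as } b \to \infty,
\end{aligned}
\]

which means that $U_{b,i_b} \xrightarrow{p} 0$ as $b \to \infty$. Moreover, we have that $|U_{b,i_b}| = U_{b,i_b} \le N[i_b - 1]^2$ for all
b, and $E\left[N[i_b - 1]^2\right] = \sigma^2 < \infty$. Therefore, by the Dominated Convergence Theorem, we have that
$E[U_{b,i_b}] \to 0$ as $b \to \infty$. We conclude that

\[
\frac{1}{s_b^2} \sum_{i=1}^{b} E\left[ Y_{b,i}^2\, 1\{|Y_{b,i}| \ge \epsilon s_b\} \right]
= \frac{2}{\sigma^2 b} \sum_{i=1}^{b} E\left[U_{b,i}\right]
\le \frac{2}{\sigma^2} \max_{1 \le i \le b} E\left[U_{b,i}\right] \to 0 \quad \text{as } b \to \infty,
\]

and Lindeberg's condition (8) is satisfied for any $\epsilon > 0$. Hence, from Theorem 4, we have that

\[ \frac{\sum_{i=1}^{b} Y_{b,i}}{\sigma \sqrt{b/2}} = \frac{\sqrt{2}\, Z_b}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1). \]

□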

Now consider the additive noise term in (II). It is the real part of (6), which, by Lemma 1, converges
in distribution to $\mathcal{N}(0, \sigma_v^2/2)$ as $b \to \infty$. Moreover, it is easy to see that Lemma 1 can be restated with
sines replacing the cosines, and the same result will hold. Thus, the additive noise in (III) also converges
in distribution to $\mathcal{N}(0, \sigma_v^2/2)$. Finally, for the received signals in (I) and (IV), it is easy to see that the
additive noise in (6) only has a real component, and by the usual Central Limit Theorem, it converges in
distribution to $\mathcal{N}(0, \sigma_v^2)$.
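
The convergence in Lemma 1 can be observed in a quick simulation. This is only a sanity check under assumed parameters (uniform noise, b = 256, frequency index 5, 20000 trials), not part of the proof; the function name `mixed_noise` is chosen here.

```python
import numpy as np

def mixed_noise(noise, ell):
    """Z_b = (1/sqrt(b)) * sum_i N[i] cos(2 pi ell i / b), one value per row of `noise`."""
    b = noise.shape[-1]
    weights = np.cos(2 * np.pi * ell * np.arange(b) / b)
    return noise @ weights / np.sqrt(b)

b, ell, sigma = 256, 5, 1.0
# s_b^2 / sigma^2 = sum_i cos^2(2 pi ell i / b), which equals b/2 for ell not in {0, b/2}.
s2_over_sigma2 = np.sum(np.cos(2 * np.pi * ell * np.arange(b) / b) ** 2)

rng = np.random.default_rng(3)
# Uniform noise on [-sqrt(3)*sigma, sqrt(3)*sigma]: zero mean, variance sigma^2,
# and clearly non-Gaussian (its excess kurtosis is -1.2).
noise = rng.uniform(-np.sqrt(3) * sigma, np.sqrt(3) * sigma, size=(20000, b))
z = mixed_noise(noise, ell)
sample_var = z.var()
# Excess kurtosis is 0 for a Gaussian distribution.
excess_kurtosis = np.mean((z - z.mean()) ** 4) / sample_var ** 2 - 3
```

The sample variance should sit near $\sigma^2/2$ and the excess kurtosis near 0, even though each individual noise sample is uniform.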



3.3 Interleaving and Outer Code 

In the previous section, we saw that by choosing the length of the OFDM block b sufficiently large, it is
possible to make the effective additive noise at each node v arbitrarily close (in the distribution sense)
to a zero-mean Gaussian noise with variance $\sigma_v^2/2$ for (II) and (III) and $\sigma_v^2$ for (I) and (IV). Notice that,
since in (4) we restricted the power used in the network uses corresponding to (II) and (III) to P/2, all
of our effective channels have the same SNR they would have if the transmit signals had power P and
the noise had variance $\sigma_v^2$.

In this section, we address the fact that, as we mentioned before, the additive noise at node v in the 
b effective network uses are dependent on each other. In order to handle this dependence, we consider
using the network for a total of bk times, performing the OFDM-like approach from section 3.1 within
each block of b time steps. Then, by interleaving the symbols, it is possible to view the result as b blocks 
of k network uses. This idea is illustrated in Figure 3. Notice that, within each block of k network uses, 
the additive noises are independent, but they are dependent among distinct blocks. 
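
The interleaving step amounts to a transpose of the (OFDM block, effective use) grid. A tiny sketch with hypothetical labels in place of actual signals:

```python
import numpy as np

k, b = 3, 4
# Entry (m, i) stands for effective network use i of the m-th OFDM block.
uses = np.array([[f"block{m}-use{i}" for i in range(b)] for m in range(k)])
# Interleave: group together the i-th effective use of every OFDM block.
interleaved = uses.T          # shape (b, k): b blocks of k network uses

print(interleaved[1])         # the k uses that form the second length-k block
```

Each row of `interleaved` draws one effective use from each of the k OFDM blocks, so the noises within a row come from disjoint sets of noise samples, while different rows share the same underlying samples.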

Since, from the statement of Theorem 2, the rate tuple R is achievable by coding schemes with finite
reading precision, we may assume that we have a sequence of coding schemes $C_k = (k, R)$ with finite
reading precision $p_k$, whose error probability when used on the AWGN network is

\[ \epsilon_k = P_{error}(C_k), \]

and satisfies $\epsilon_k \to 0$ as $k \to \infty$. Now, consider applying this code over one of the b blocks of length k
that we obtained from the interleaving. Notice that, in order to apply code $C_k$ on a length-k block other
than the first or the last one, we will have to divide the output transmit signal of all the nodes by $\sqrt{2}$ to
satisfy (4), but since the additive noises in these blocks have their variance divided by 2 as well, each
node can re-scale its received signal by multiplying it by $\sqrt{2}$, and the code performs in the exact same
way. Now, if b is chosen fairly large, over this block of length k, the noises at all nodes are independent
and i.i.d. over time, and are very close to Gaussian in distribution, and, intuitively, the error probability
we obtain should be close to $\epsilon_k$. We will let $\epsilon_{k,b}$ be the error probability obtained when we apply $C_k$
to one of the b blocks of length k (notice that this error probability should be the same for any of these
blocks).

[Figure 3 (diagram). The bk network uses arranged as a grid with dimensions b and k.]

Figure 3: Interleaving the effective network uses obtained from the OFDM-like scheme.

We let $\mathbf{Z}_b \in \mathbb{R}^{k|V|}$ be the random vector associated with the effective additive noises at all nodes
in V during this length-k block, assuming that we performed the OFDM-like scheme in blocks of size
b. Since each component of $\mathbf{Z}_b$ is independent and they all converge in distribution to a zero-mean
Gaussian random variable, we have that $\mathbf{Z}_b$ converges in distribution to a Gaussian random vector. We
let $\mathbf{Z}$ be this limiting distribution, and we know that the component of $\mathbf{Z}$ corresponding to node v and
time $\ell$ is distributed as $\mathcal{N}(0, \sigma_v^2)$ (or $\mathcal{N}(0, \sigma_v^2/2)$, depending on the length-k block chosen), for any
$\ell \in \{1, \ldots, k\}$. Now notice that, if we fix the messages chosen at the sources to be
$\mathbf{w} = (w_1, w_2, \ldots, w_{|L|}) \in \prod_{i=1}^{|L|}\{1, \ldots, 2^{kR_i}\}$, then whether $C_k$ makes an error is only a deterministic function of $\mathbf{Z}_b$. Therefore,
for each $\mathbf{w} \in \prod_{i=1}^{|L|}\{1, \ldots, 2^{kR_i}\}$, we can define an error set $A_{\mathbf{w}}$ corresponding to all realizations of $\mathbf{Z}_b$
that cause coding scheme $C_k$ to make an error. It is important to notice that $A_{\mathbf{w}}$ is independent of the
actual joint distribution of the noise terms; it only depends on the coding scheme $C_k$. Then we can write

\[ \epsilon_{k,b} = 2^{-k \sum_{i=1}^{|L|} R_i} \sum_{\mathbf{w}} \Pr\left[\mathbf{Z}_b \in A_{\mathbf{w}}\right] \tag{10} \]

and also

\[ \epsilon_k = 2^{-k \sum_{i=1}^{|L|} R_i} \sum_{\mathbf{w}} \Pr\left[\mathbf{Z} \in A_{\mathbf{w}}\right]. \tag{11} \]
w 

Our first goal is to show that $\epsilon_{k,b} \to \epsilon_k$ as $b \to \infty$. Recall that a Borel set $A \subseteq \mathbb{R}^m$ is said to be
a $\mu$-continuity set for some probability measure $\mu$ on $\mathbb{R}^m$, if $\mu(\partial A) = 0$, where $\partial A$ is the boundary
of A. Next, we state the following classical result, which provides an alternative characterization of
convergence in distribution.

Theorem 5 (Portmanteau [7]). Suppose we have a sequence of random vectors $\mathbf{Z}_b \in \mathbb{R}^{k|V|}$ and another
random vector $\mathbf{Z} \in \mathbb{R}^{k|V|}$. Let $\mu_b$ and $\mu$ be the probability measures on $\mathbb{R}^{k|V|}$ associated to $\mathbf{Z}_b$ and $\mathbf{Z}$
respectively. Then $\mathbf{Z}_b$ converges in distribution to $\mathbf{Z}$ if and only if

\[ \lim_{b \to \infty} \mu_b(A) = \mu(A) \]

for all $\mu$-continuity sets A.

Then, if we let $\mu$ be the probability measure on $\mathbb{R}^{k|V|}$ associated to $\mathbf{Z}$, we have the following lemma.

Lemma 2. Suppose we have a coding scheme C = (k, R) with finite reading precision p. Then, for any
choice of messages $\mathbf{w} \in \prod_{i=1}^{|L|}\{1, \ldots, 2^{kR_i}\}$, the error set $A_{\mathbf{w}}$ is a $\mu$-continuity set.

Proof. Fix some choice of messages $\mathbf{w}$. We will use the fact that C has finite reading precision p to
show that our set $A_{\mathbf{w}}$ and its complement $A_{\mathbf{w}}^c = \mathbb{R}^{k|V|} \setminus A_{\mathbf{w}}$ can be represented as a countable union of
disjoint convex sets, which will then imply the $\mu$-continuity. Recall from Definition 4 that, in a coding
scheme with finite reading precision p, a node v only has access to $\lfloor Y_v \rfloor_p$. Thus, we will call $\lfloor Y_v \rfloor_p$ the
effective received signal at v. The set

\[ \mathcal{Y} = \left\{ (y_1, \ldots, y_{k|V|}) \in \mathbb{R}^{k|V|} : y_i = \lfloor y_i \rfloor_p,\ i = 1, \ldots, k|V| \right\} \]

can be understood as the set of all possible values of the effective received signals at all nodes in V
during a length-k block. It is clear that $\mathcal{Y}$ is a countable set for any finite p.

Notice that, for our fixed choice of messages $\mathbf{w}$, the vector $\mathbf{y} \in \mathcal{Y}$ corresponding to the effective
received signals at all nodes during the length-k block is a deterministic function of the value of all
the noises in the network during the length-k block, $\mathbf{z} \in \mathbb{R}^{k|V|}$. Therefore, for each $\mathbf{y} \in \mathcal{Y}$, we define
$Q(\mathbf{y}) \subseteq \mathbb{R}^{k|V|}$ to be the set of noise realizations $\mathbf{z}$ that will result in $\mathbf{y}$ being the effective received signals.
We claim that $Q(\mathbf{y})$ is a convex set. To see this, consider two noise realizations $\mathbf{z}, \mathbf{z}' \in Q(\mathbf{y})$ and fix
some $\alpha \in [0, 1]$. We will show that if we replace one of the components of $\mathbf{z}$ with the corresponding
component of $\alpha \mathbf{z} + (1 - \alpha)\mathbf{z}'$, the resulting noise realization $\mathbf{z}''$ is still in $Q(\mathbf{y})$. Then, by using the same
argument with $\mathbf{z}''$ instead of $\mathbf{z}$, another component of $\mathbf{z}''$ is replaced with a component of $\alpha \mathbf{z} + (1 - \alpha)\mathbf{z}'$,
and by repeating this argument, it follows that $\alpha \mathbf{z} + (1 - \alpha)\mathbf{z}'$ is itself in $Q(\mathbf{y})$. So let us focus on the
component corresponding to node v at time $\ell$. Let $y_v[\ell]^*$ be the noiseless version of the received signal
at v at time $\ell$ with its complete binary expansion. Since $\mathbf{z}$ and $\mathbf{z}'$ result in the same $\mathbf{y}$, we have that

\[ y_v[\ell] = \lfloor y_v[\ell]^* + z_v[\ell] \rfloor_p = \lfloor y_v[\ell]^* + z'_v[\ell] \rfloor_p. \]

Now, if we assume wlog that $z_v[\ell] \le z'_v[\ell]$, we have

\[ \lfloor y_v[\ell]^* + z_v[\ell] \rfloor_p \le \lfloor y_v[\ell]^* + \alpha z_v[\ell] + (1 - \alpha) z'_v[\ell] \rfloor_p \le \lfloor y_v[\ell]^* + z'_v[\ell] \rfloor_p. \]

Thus, it follows that $y_v[\ell] = \lfloor y_v[\ell]^* + \alpha z_v[\ell] + (1 - \alpha) z'_v[\ell] \rfloor_p$, and by replacing $z_v[\ell]$ with
$\alpha z_v[\ell] + (1 - \alpha) z'_v[\ell]$, we obtain a noise realization $\mathbf{z}''$ that is still in $Q(\mathbf{y})$, and the claim follows.

In Appendix A, we prove that, for any convex set $S$, $\lambda(\partial S) = 0$, where $\lambda$ is the Lebesgue measure.
Moreover, since our measure $\mu$ is absolutely continuous, it follows by definition that

$$\lambda(S) = 0 \Rightarrow \mu(S) = 0,$$

for any Borel set $S$. Thus, since $\lambda(\partial Q(y)) = 0$, we have that $\mu(\partial Q(y)) = 0$. This, in turn, clearly
implies that

$$\mu\left(Q(y)^\circ\right) = \mu\left(\overline{Q(y)}\right) = \mu\left(Q(y)\right), \qquad (12)$$

where we use $S^\circ$ to represent the interior of $S$ and $\overline{S}$ to represent its closure. Next, let
$\mathcal{Y}_{A_w} = \{y \in \mathcal{Y} : A_w \cap Q(y) \neq \emptyset\}$. Notice that all noise realizations $z \in Q(y)$ will cause all nodes and, in
particular, the destination nodes to effectively receive the exact same signals. Therefore, it must be the
case that, if $A_w \cap Q(y) \neq \emptyset$, then $Q(y) \subset A_w$, which implies that

$$\bigcup_{y \in \mathcal{Y}_{A_w}} Q(y) = A_w.$$

Moreover, it is obvious that any noise realization must belong to exactly one set $Q(y)$, and we have

$$\bigcup_{y \in \mathcal{Y} \setminus \mathcal{Y}_{A_w}} Q(y) = A_w^c.$$



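Finally, we obtain

$$\begin{aligned}
\mu\left(A_w^\circ\right) &\overset{(i)}{\ge} \mu\Big(\bigcup_{y \in \mathcal{Y}_{A_w}} Q(y)^\circ\Big) \overset{(ii)}{=} \sum_{y \in \mathcal{Y}_{A_w}} \mu\left(Q(y)^\circ\right) \overset{(iii)}{=} \sum_{y \in \mathcal{Y}_{A_w}} \mu\left(Q(y)\right) \\
&= 1 - \sum_{y \in \mathcal{Y} \setminus \mathcal{Y}_{A_w}} \mu\left(Q(y)\right) = 1 - \sum_{y \in \mathcal{Y} \setminus \mathcal{Y}_{A_w}} \mu\left(Q(y)^\circ\right) \\
&= 1 - \mu\Big(\bigcup_{y \in \mathcal{Y} \setminus \mathcal{Y}_{A_w}} Q(y)^\circ\Big) \ge 1 - \mu\left((A_w^c)^\circ\right) = \mu\left(\overline{A_w}\right),
\end{aligned}$$

where (i) follows since, for sets $B_1, B_2, \dots$, $(\cup_i B_i)^\circ \supseteq \cup_i B_i^\circ$, (ii) follows from the countability of $\mathcal{Y}$
and the fact that $Q(y_1) \cap Q(y_2) = \emptyset$ for $y_1 \neq y_2$, and (iii) follows from (12). We conclude that

$$\mu(\partial A_w) = \mu\left(\overline{A_w}\right) - \mu\left(A_w^\circ\right) = 0. \qquad \square$$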

Now it follows from Theorem 5 and Lemma 2 that, for all message choices $w$, we will have

$$\lim_{b \to \infty} \Pr\left[Z_b \in A_w\right] = \Pr\left[Z \in A_w\right], \qquad (13)$$

which implies that $\epsilon_{b,k} \to \epsilon_k$ as $b \to \infty$.

From the previous discussion, we see that we can apply code $C_k$ within each of the $b$ blocks of length
$k$ and obtain a probability of error (within that block) that tends to $\epsilon_k$ as $b \to \infty$. However, since we
have a total of $b$ blocks of length $k$, we make an error if we make an error in any of the $b$ blocks.
It turns out that a simple union bound does not work here, since the error probability would be of the
form $b\,\epsilon_{b,k}$ and we would not be able to guarantee that it tends to 0 as $b$ and $k$ go to infinity. Instead, we
consider using an outer code for each source-destination pair.

The idea is to apply coding scheme $C_k$ to each of the $b$ length-$k$ blocks, and then view this as creating
a discrete channel for each source-destination pair. More specifically, for each length-$bk$ block, source
$s_j$ chooses a symbol (rather than a message) from $\{1, \dots, 2^{kR_j}\}^b$ and transmits the $b$ corresponding
codewords from $C_k$. Then destination $d_j$ will apply the decoder from code $C_k$ inside each length-$k$ block
and obtain an output symbol also from $\{1, \dots, 2^{kR_j}\}^b$. Notice that, by viewing the input to $bk$ network
uses as a single input to this discrete channel, we make sure we have a discrete memoryless channel, and
we can use the Channel Coding Theorem. We can view $W_j^b$ and $\hat{W}_j^b$ as the discrete input and output
of the channel between $s_j$ and $d_j$. We will then construct a code (whose rate is to be determined) for
this discrete channel between $s_j$ and $d_j$ by picking each entry uniformly at random from $\{1, \dots, 2^{kR_j}\}^b$.
Then, source-destination pair $(s_j, d_j)$ can achieve rate

$$\begin{aligned}
\frac{1}{bk} I\left(W_j^b; \hat{W}_j^b\right) &= \frac{1}{bk} \left( H\left(W_j^b\right) - H\left(W_j^b \,\middle|\, \hat{W}_j^b\right) \right) \\
&\ge R_j - \frac{1}{bk} \sum_{i=1}^{b} H\left(W_j[i] \,\middle|\, \hat{W}_j[i]\right) \\
&\overset{(i)}{\ge} R_j - \frac{1}{k}\left(1 + \epsilon_{b,k}\, k R_j\right),
\end{aligned}$$

where (i) follows from Fano's Inequality, since, within each length-$k$ block, we are applying code $C_k$
and we have an average error probability of at most $\epsilon_{b,k}$ (it should in fact be much less than $\epsilon_{b,k}$ since
we are only considering the error event $\hat{W}_j[i] \neq W_j[i]$ and $\epsilon_{b,k}$ refers to the union of these events for all
source-destination pairs).

We conclude that, by choosing $b$ and $k$ sufficiently large, it is possible for each source-destination
pair to achieve arbitrarily close to rate $R_j$. Thus, our coding scheme can achieve arbitrarily close to the
rate tuple $R$. This concludes the proof of Theorem 2.
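To see why the outer code helps where the union bound does not, the two quantities can be compared for concrete numbers. The following is a minimal sketch with hypothetical function names; the values of the rate, $k$, $b$ and the per-block error probability are arbitrary choices made for illustration, not values from the paper.

```python
def outer_code_rate_bound(rate, k, eps):
    """Fano-based lower bound R_j - (1/k) * (1 + eps * k * R_j) on the outer-code rate."""
    return rate - (1.0 + eps * k * rate) / k


def union_bound(b, eps):
    """Naive bound b * eps on the error probability over b blocks, capped at 1."""
    return min(1.0, b * eps)
```

With rate 1, $k = 1000$ and a per-block error probability of 0.01, the rate bound evaluates to about 0.989, while the union bound over $b = 1000$ blocks is already vacuous. The rate loss depends only on $k$ and the per-block error probability, not on how fast that probability decays relative to $b$.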



3.4 Optimality of Coding Schemes with Finite Reading Precision 

In this section, we prove Theorem 3. This theorem implies that, if we restrict ourselves to coding
schemes with finite reading precision, and allow the reading precision to tend to infinity along the
sequence of coding schemes, we can achieve any point in the capacity region of a wireless network, thus
characterizing the optimality of coding schemes with finite reading precision. We start by considering
a sequence of coding schemes $C_n$ (with infinite reading precision) that achieves rate tuple $R$ on a
$|L|$-unicast wireless network $(G, L)$. Then we will build a sequence of coding schemes $C_n^*$ with finite
reading precision that also achieves rate tuple $R$ on $(G, L)$.

Let $\epsilon_n$ be the error probability of coding scheme $C_n$, which achieves rate tuple $R$ on $(G, L)$. From
Definition 3, we have that $\epsilon_n \to 0$ as $n \to \infty$. For any fixed $n$, we will first build a sequence of
coding schemes $C^*_{m,n}$ with finite reading precision $m = 1, 2, \dots$, for $(G, L)$, such that code $C^*_{m,n}$ has
error probability $\epsilon_{m,n}$, where $\epsilon_{m,n} \to \epsilon_n$ as $m \to \infty$. This will then allow us to choose a finite $m$ for
which $\epsilon_{m,n}$ is arbitrarily close to $\epsilon_n$. Notice that, from Definition 1, relaying and decoding functions
should be deterministic. However, in order to construct coding scheme $C^*_{m,n}$, we will first assume that
the relaying and decoding functions are allowed to be randomized, and later we will derandomize the
constructed coding scheme. Recall that, from Definition 1, coding scheme $C_n$ is comprised of encoding
functions $\{f_i : 1 \le i \le |L|\}$, relaying functions $\{r_v^{(t)} : v \in V, 1 \le t \le n\}$ and decoding functions
$\{g_i : 1 \le i \le |L|\}$. We will build $C^*_{m,n}$ from $C_n$ by using the same encoding functions $f_i$, $i = 1, \dots, |L|$,
and replacing the relaying functions with

$$\tilde{r}_v^{(t)}\left(y_v[1], \dots, y_v[t-1]\right) = r_v^{(t)}\left(y_v^{(m)}[1], \dots, y_v^{(m)}[t-1]\right)$$

for $1 \le t \le n$ and $v \in V$, and replacing the decoding functions with

$$\tilde{g}_i\left(y_{d_i}[1], \dots, y_{d_i}[n]\right) = g_i\left(y_{d_i}^{(m)}[1], \dots, y_{d_i}^{(m)}[n]\right),$$

for $1 \le i \le |L|$, where we define

$$y_v^{(m)}[t] = \lfloor y_v[t] \rfloor_m + U_v^{(m)}[t], \qquad (14)$$

for $v \in V$ and $1 \le t \le n$, where $U_v^{(m)}[1], \dots, U_v^{(m)}[n]$ are independent uniform random variables drawn
from $(-2^{-m-1}, 2^{-m-1})$, independent from all signals and noises in the network. Notice that, since the
relaying functions $r_v^{(t)}$ satisfy the power constraint in Definition 1, so will the new relaying functions
$\tilde{r}_v^{(t)}$. In order to relate the error probability of $C^*_{m,n}$ to the error probability of $C_n$, we will need the
following lemma, whose proof is in the Appendix.

Lemma 3. Suppose $Y$ is a random variable with density $f$. Let $Y^{(m)} = \lfloor Y \rfloor_m + U^{(m)}$, where $U^{(m)}$
is uniformly distributed in $(-2^{-m-1}, 2^{-m-1})$ and independent from $Y$. Then each $Y^{(m)}$ has a density
$f^{(m)}$, and $f^{(m)}$ converges pointwise almost everywhere to $f$.
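The convergence stated in Lemma 3 can be observed numerically. The sketch below is only an illustration under stated assumptions: $Y$ is taken to be standard normal, the quantizer is taken to truncate to $m$ bits after the binary point, and the density of the dithered variable is evaluated through the closed form $2^m \left(F(a_m + 2^{-m}) - F(a_m)\right)$ with $a_m = 2^{-m} \lceil y 2^m - 1/2 \rceil$ that is derived in the Appendix. The function names are hypothetical.

```python
import math


def normal_pdf(x):
    """Density f of a standard normal random variable."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Cumulative distribution function F of a standard normal random variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dithered_density(y, m):
    """Density of the quantized-plus-dither variable at y, via 2^m * (F(a_m + 2^-m) - F(a_m))."""
    scale = 2.0 ** m
    a_m = math.ceil(y * scale - 0.5) / scale
    return scale * (normal_cdf(a_m + 1.0 / scale) - normal_cdf(a_m))


def max_gap(m, points):
    """Largest absolute gap between the dithered density and f over the given points."""
    return max(abs(dithered_density(y, m) - normal_pdf(y)) for y in points)
```

Evaluating `max_gap` at a few points for increasing `m` shows the gap shrinking roughly in proportion to $2^{-m}$, in line with the pointwise convergence claimed by the lemma.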

This lemma will be used to show that, by picking $m$ sufficiently large, we can make the error probability
of code $C^*_{m,n}$ arbitrarily close to $\epsilon_n$. Suppose we fix the message vector $w \in \prod_{i=1}^{|L|} \{1, \dots, 2^{nR_i}\}$
and let $Y$ be the random vector of length $n|V|$ corresponding to all the received signals at all nodes
during the $n$ time steps in the block if code $C_n$ is used. More precisely, we write $Y = (Y[1], \dots, Y[n])$,
where $Y[t] = (Y_1[t], \dots, Y_{|V|}[t])$ is the random vector of received signals at all $|V|$ nodes at time $t$,
for $1 \le t \le n$. The received signal at node $v$ at time $t$, $Y_v[t]$, is defined in (2). Notice that here
we assume that the set of nodes $V$ can be written as $V = \{1, \dots, |V|\}$ in order to simplify some
expressions. We claim that the random vector $Y$ conditioned on the choice of messages $W = w$ has a
density. To see this, we notice that, conditioned on the signals received up to time $t - 1$, i.e.,
on $(Y[1], \dots, Y[t-1]) = (y[1], \dots, y[t-1])$, and on $W = w$, the transmit signals at time $t$, $X_v[t]$ for
$v \in V$, are all deterministic. Thus, the received signals $Y_v[t]$, for $v \in V$, are conditionally independent
and each one is normally distributed, conditioned on $(Y[1], \dots, Y[t-1]) = (y[1], \dots, y[t-1])$ and
$W = w$. Therefore, the conditional pdf $f_{Y_v[t] | Y[1], \dots, Y[t-1], W}(y_v[t] \,|\, y[1], \dots, y[t-1], w)$ exists for each
$v \in V$. We conclude that, conditioned on $W = w$, the random vector $Y$ has a density given by

$$f_{Y|W}(y|w) = \prod_{v=1}^{|V|} f_{Y_v[1]|W}\left(y_v[1] \,|\, w\right) \prod_{t=2}^{n} \prod_{v=1}^{|V|} f_{Y_v[t] | Y[1], \dots, Y[t-1], W}\left(y_v[t] \,|\, y[1], \dots, y[t-1], w\right). \qquad (15)$$

Similarly, we let $Y^{(m)}$ be the vector of $n|V|$ effective received signals (14) if code $C^*_{m,n}$ is used
instead, i.e., $Y^{(m)} = (Y^{(m)}[1], \dots, Y^{(m)}[n])$, where $Y^{(m)}[t] = (Y_1^{(m)}[t], \dots, Y_{|V|}^{(m)}[t])$. Then, when we
condition on $(Y^{(m)}[1], \dots, Y^{(m)}[t-1]) = (y[1], \dots, y[t-1])$, and on $W = w$, the transmit signals
at time $t$, $X_v^{(m)}[t]$ for $v \in V$, are all deterministic, and the effective received signals $Y_v^{(m)}[t]$, for
$v \in V$, are conditionally independent (although not normally distributed). To see that the conditional
pdf $f_{Y_v^{(m)}[t] | Y^{(m)}[1], \dots, Y^{(m)}[t-1], W}(y_v[t] \,|\, y[1], \dots, y[t-1], w)$ exists, we notice that, from (14), $Y_v^{(m)}[t]$ is the
sum of two independent random variables (even when conditioned on $(Y^{(m)}[1], \dots, Y^{(m)}[t-1]) =
(y[1], \dots, y[t-1])$ and $W = w$), and, since $U_v^{(m)}[t]$ has a density, so does $Y_v^{(m)}[t]$ (see page 266 in [8]).
Therefore, we can write the conditional pdf of $Y^{(m)}$ conditioned on $W$ as

$$f_{Y^{(m)}|W}(y|w) = \prod_{v=1}^{|V|} f_{Y_v^{(m)}[1]|W}\left(y_v[1] \,|\, w\right) \prod_{t=2}^{n} \prod_{v=1}^{|V|} f_{Y_v^{(m)}[t] | Y^{(m)}[1], \dots, Y^{(m)}[t-1], W}\left(y_v[t] \,|\, y[1], \dots, y[t-1], w\right). \qquad (16)$$

As we mentioned before, conditioned on $(Y^{(m)}[1], \dots, Y^{(m)}[t-1]) = (y[1], \dots, y[t-1])$ and $W = w$
(or just $W = w$, if $t = 1$), $Y_v[t]$ is normally distributed and thus has a density. Therefore, the random
variables $Y_v^{(m)}[t] = \lfloor Y_v[t] \rfloor_m + U_v^{(m)}[t]$, for $m = 1, 2, \dots$, conditioned on $(Y^{(m)}[1], \dots, Y^{(m)}[t-1]) =
(y[1], \dots, y[t-1])$ and $W = w$, satisfy the conditions of Lemma 3, and we have that

$$f_{Y_v^{(m)}[1]|W}\left(y_v[1] \,|\, w\right) \to f_{Y_v[1]|W}\left(y_v[1] \,|\, w\right)$$

and

$$f_{Y_v^{(m)}[t] | Y^{(m)}[1], \dots, Y^{(m)}[t-1], W}\left(y_v[t] \,|\, y[1], \dots, y[t-1], w\right) \to f_{Y_v[t] | Y[1], \dots, Y[t-1], W}\left(y_v[t] \,|\, y[1], \dots, y[t-1], w\right),$$

as $m \to \infty$, for $t = 2, \dots, n$ and $v \in V$, for almost all $y \in \mathbb{R}^{n|V|}$. Therefore, we conclude that
$f_{Y^{(m)}|W}(y|w) \to f_{Y|W}(y|w)$ as $m \to \infty$ for almost all $y \in \mathbb{R}^{n|V|}$ and any $w \in \prod_{i=1}^{|L|} \{1, \dots, 2^{nR_i}\}$.

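Next we notice that, conditioned on the message vector $W = w$, whether we make an error or not
is a function of the received signals at all nodes during the $n$ time steps (it is actually only a function of
the received signals at the destinations, but that is irrelevant). Thus, there exists a set $E_w \subset \mathbb{R}^{n|V|}$ of
received signals during the $n$ time steps which cause a decoding error (at any of the decoders). We will
let $\mu_w^{(n)}$ be the probability measure on $\mathbb{R}^{n|V|}$ corresponding to $Y$ (the received signals when using coding
scheme $C_n$) conditioned on $W = w$ and $\mu_w^{(m,n)}$ be the probability measure on $\mathbb{R}^{n|V|}$ corresponding to
$Y^{(m)}$ (the effective received signals when we use coding scheme $C^*_{m,n}$) conditioned on $W = w$. By
Scheffé's Theorem [8], we have that

$$\sup_{A \in \mathcal{B}} \left| \mu_w^{(n)}(A) - \mu_w^{(m,n)}(A) \right| \le \int \left| f_{Y|W}(y|w) - f_{Y^{(m)}|W}(y|w) \right| d\lambda \to 0, \quad \text{as } m \to \infty,$$

where $\mathcal{B}$ is the Borel $\sigma$-field on $\mathbb{R}^{n|V|}$, and $\lambda$ is the Lebesgue measure. This, in turn, implies that for any
choice of messages $w$, we must have $\lim_{m \to \infty} \mu_w^{(m,n)}(E_w) = \mu_w^{(n)}(E_w)$. We conclude that

$$\lim_{m \to \infty} \epsilon_{m,n} = \lim_{m \to \infty} 2^{-n \sum_{i=1}^{|L|} R_i} \sum_{w} \Pr\left[ Y^{(m)} \in E_w \,\middle|\, W = w \right] = \lim_{m \to \infty} 2^{-n \sum_{i=1}^{|L|} R_i} \sum_{w} \mu_w^{(m,n)}(E_w) \qquad (17)$$

$$= 2^{-n \sum_{i=1}^{|L|} R_i} \sum_{w} \mu_w^{(n)}(E_w) = \epsilon_n. \qquad (18)$$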
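Scheffé's argument controls the gap between the two error probabilities through the $L^1$ distance between the two densities. As a rough illustration for a single received signal, the sketch below assumes a standard normal signal and a quantizer that truncates to $m$ bits after the binary point, and approximates that $L^1$ distance with a midpoint rule. The integration range, step count and function names are choices made for this example only.

```python
import math


def normal_pdf(x):
    """Density of a standard normal random variable."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Cumulative distribution function of a standard normal random variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dithered_pdf(y, m):
    """Density at y of the truncated-plus-uniform-dither version of the signal."""
    scale = 2.0 ** m
    a_m = math.ceil(y * scale - 0.5) / scale
    return scale * (normal_cdf(a_m + 1.0 / scale) - normal_cdf(a_m))


def l1_distance(m, lo=-6.0, hi=6.0, steps=12000):
    """Midpoint-rule approximation of the integral of |f - f_m| over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * width
        total += abs(normal_pdf(y) - dithered_pdf(y, m)) * width
    return total
```

Increasing `m` drives the approximate distance toward zero, so the probability that either measure assigns to any fixed error set must agree in the limit.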



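Therefore, we can choose, for each $n$, $m_n$ sufficiently large such that the probability of error of code
$C^*_{m_n,n}$, $\epsilon_{m_n,n}$, is at most $2\epsilon_n$. Finally, we need to take care of the fact that $C^*_{m_n,n}$ uses randomized
relaying and decoding functions. First, we notice that if we let $U$ be the random vector corresponding
to the $n|V|$ samples from $(-2^{-(m_n+1)}, 2^{-(m_n+1)})$ used by the $|V|$ nodes during $n$ time steps, then we
can write

$$\epsilon_{m_n,n} = 2^{-n \sum_{i=1}^{|L|} R_i} \sum_{w} \Pr\left[ Y^{(m_n)} \in E_w \,\middle|\, W = w \right] = E_U\left[ 2^{-n \sum_{i=1}^{|L|} R_i} \sum_{w} \Pr\left[ Y^{(m_n)} \in E_w \,\middle|\, W = w, U \right] \right].$$

Therefore, there must exist some $u \in \mathbb{R}^{n|V|}$ for which

$$2^{-n \sum_{i=1}^{|L|} R_i} \sum_{w} \Pr\left[ Y^{(m_n)} \in E_w \,\middle|\, W = w, U = u \right] \le \epsilon_{m_n,n}.$$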



Thus, we define $C_n^*$ to be the coding scheme obtained by having each node $v$ at time $t$ quantize its received
signal with resolution $m_n$, add to it $u_v[t]$ (i.e., the entry of $u$ corresponding to node $v$ and time $t$)
and then apply the relaying/decoding function from code $C_n$. It is then clear that $C_n^*$ has deterministic
relaying/decoding functions, and its error probability is at most $\epsilon_{m_n,n} \le 2\epsilon_n$. Therefore, the sequence of
codes $C_n^*$, $n = 1, 2, \dots$, has finite reading precision and achieves the rate tuple $R$.
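The derandomization step is an averaging argument: if the error probability averaged over the dither is at most some value, then at least one fixed dither vector does no worse. A minimal sketch of both ingredients follows. It assumes truncation to $m$ bits after the binary point as the quantizer, and the names `effective_signal` and `pick_dither`, as well as the idea of scoring a finite list of candidate dither vectors with a user-supplied error estimate, are hypothetical and only meant to illustrate the argument.

```python
import math


def effective_signal(y, m, u):
    """Quantize y to m bits after the binary point, then add the fixed dither value u."""
    scale = 2 ** m
    return math.floor(y * scale) / scale + u


def pick_dither(candidates, error_fn):
    """Return the candidate dither vector with the smallest error estimate.

    The minimum over the candidates never exceeds their average, which mirrors
    the existence argument used to fix a deterministic dither vector.
    """
    return min(candidates, key=error_fn)
```

Once a dither vector has been fixed, `effective_signal` is a deterministic function of the received signal, which is what makes the resulting relaying and decoding functions deterministic.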
4 Extension to General Traffic Demands



One immediate extension of the result in Theorem 1 is to consider wireless networks with general traffic
demands. These could include non-unicast flows such as multicast and broadcast flows. We again
consider an additive noise wireless network described by a directed graph $G = (V, E)$. This time,
however, each node $v \in V$ is a source and has a message $w(v, D)$ for each set of destinations $D \in \mathcal{P}(V)$,
where $\mathcal{P}(V)$ is the power set of $V$. We can now replace Definition 1 with the following.

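Definition 6. A coding scheme $C$ with block length $n \in \mathbb{N}$ and rate tuple $R \in \mathbb{R}^{|V| |\mathcal{P}(V)|}$ for an additive
noise wireless network consists of:

1. Encoding/relaying functions $r_v^{(t)} : \mathbb{R}^{t-1} \times \prod_{D \in \mathcal{P}(V)} \{1, \dots, 2^{nR(v,D)}\} \to \mathbb{R}$, for $t = 1, \dots, n$, for
each node $v \in V$, satisfying the average power constraint

$$\frac{1}{n} \sum_{t=1}^{n} \left[ r_v^{(t)}\left(y_1, \dots, y_{t-1}, w_v\right) \right]^2 \le P,$$

for all $(y_1, \dots, y_{t-1}) \in \mathbb{R}^{t-1}$ and $w_v \in \prod_{D \in \mathcal{P}(V)} \{1, \dots, 2^{nR(v,D)}\}$.

2. A decoding function $g_u : \mathbb{R}^n \to \prod_{v \in V,\, D \in \mathcal{P}(V) : u \in D} \{1, \dots, 2^{nR(v,D)}\}$ for each node $u \in V$.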

With this definition of a coding scheme, it is straightforward to extend Definitions 3, 4 and 5. Then 
we generalize Theorem 1 as follows. 

Theorem 6. Suppose a rate tuple $R$ is achievable on an AWGN wireless network $G$. Then it is possible
to construct a sequence of coding schemes that achieves arbitrarily close to $R$ on the same additive
noise wireless network $G$ where, for each relay $v$, the distribution of $N_v$ is replaced with an arbitrary
absolutely continuous distribution satisfying $E[N_v] = 0$ and $E[N_v^2] = \sigma_v^2$.

Theorem 6 can be proved using the same steps as in the proof of Theorem 1. To re-prove Theorem
2 in this new setting, we start by applying the OFDM-like scheme to the transmit and received signals
of every node exactly as done in section 3.1. Thus, the convergence in distribution of the effective
additive noise terms to Gaussian, proved in section 3.2, still holds. Therefore, we may assume that, as
in the beginning of section 3.3, we have $k$ blocks of $b$ network uses each, and we apply the OFDM-like
scheme inside each length-$b$ block. Next, by interleaving the network uses, we obtain $b$ blocks of length
$k$ inside which the network is approximately AWGN. Furthermore, since we start off with a sequence of
coding schemes $C_k$ (with block length $k$ and rate tuple $R$) with finite reading precision, the proof of Lemma 2
holds verbatim, except that $w$, the vector of messages chosen, is now a vector in
$\prod_{v \in V,\, D \in \mathcal{P}(V)} \{1, \dots, 2^{kR(v,D)}\}$. Thus, within
each length-$k$ block, the probability that any node decodes any of its messages incorrectly (assuming
all messages are chosen independently and uniformly at random) is given by $\epsilon_{b,k}$, where $\epsilon_{b,k} \to \epsilon_k$ as
$b \to \infty$ and $\epsilon_k \to 0$ as $k \to \infty$.

In order to deal with the dependence between the noise realizations of different length-$k$ blocks, we
will again consider employing outer codes. This time, however, instead of having one outer code for
each source-destination pair, we will have one outer code for each message $w(v, D)$ (i.e., one outer code
for each $v \in V$ and $D \in \mathcal{P}(V)$). Thus, for each $v \in V$ and $D \in \mathcal{P}(V)$, we will define a broadcast
discrete channel with input and output alphabet $\{1, \dots, 2^{kR(v,D)}\}^b$, where $v$ is the source and all nodes
in $D$ are the destinations, which are all interested in the same message. We construct each code by
sampling $\{1, \dots, 2^{kR(v,D)}\}^b$ uniformly at random. Let $W(v, D)^b$ correspond to a random symbol chosen
by $v$ uniformly at random from $\{1, \dots, 2^{kR(v,D)}\}^b$, and $\hat{W}(v, D)_u^b$ be the corresponding output symbol at
each node $u \in D$. For the outer code associated with $v$ and $D$, we can achieve rate

$$\begin{aligned}
\frac{1}{bk} \min_{u \in D} I\left(W(v, D)^b; \hat{W}(v, D)_u^b\right) &= \frac{1}{bk} \min_{u \in D} \left( H\left(W(v, D)^b\right) - H\left(W(v, D)^b \,\middle|\, \hat{W}(v, D)_u^b\right) \right) \\
&\ge R(v, D) - \max_{u \in D} \frac{1}{bk} \sum_{i=1}^{b} H\left(W(v, D)[i] \,\middle|\, \hat{W}(v, D)_u[i]\right) \\
&\overset{(i)}{\ge} R(v, D) - \max_{u \in D} \frac{1}{k} \left(1 + \epsilon_{b,k}\, k R(v, D)\right) \\
&= R(v, D)\left(1 - \epsilon_{b,k}\right) - \frac{1}{k},
\end{aligned}$$

where (i) follows from Fano's Inequality, since, within each length-$k$ block, we apply code $C_k$ and we
have an average error probability of at most $\epsilon_{b,k}$. Therefore, by choosing $b$ and $k$ sufficiently large, our
constructed code achieves arbitrarily close to $R$ on the non-Gaussian additive noise wireless network.

The proof of Theorem 3 holds in this new setting almost verbatim. The only difference is that we
now have one rate for each source $s \in V$ and destination set $D \subset V$, the message vector $w$ has size
$|V| \times |\mathcal{P}(V)|$, and the expressions for the error probability in (17) must be modified. This concludes the
proof of Theorem 6.

5 Concluding Remarks 

In this work, we proved that the Gaussian noise is the worst-case noise in additive noise wireless net- 
works. This extends the classical result that Gaussian noise is the worst-case noise for point-to-point 
additive noise channels, which is commonly used as a justification for the modeling of the noise in wire- 
less systems as Gaussian noise. Thus, we provide formal evidence that this modeling is indeed justified 
beyond the point-to-point setting. 

It is important to highlight the fact that we prove our result by actually constructing a coding scheme
that performs well on a non-Gaussian network given a coding scheme designed to perform well on an
AWGN network. This is different from the mutual-information-based proof for point-to-point channels,
which relies on the Channel Coding Theorem, and, thus, on random coding arguments which only
provide existence guarantees. Even though we do make use of random coding arguments when we
construct the outer codes in section 3.3, this is done mostly as a way to construct coding schemes whose
error probability can be shown to tend to zero. As mentioned in section 3.3, the outer codes must be used
since the union bound would only provide an upper bound on the error probability of the form $b\,\epsilon_{b,k}$, and
we do not have any guarantees on the rate of decay of $\epsilon_{b,k}$ as $b$ and $k$ increase. Thus, it seems reasonable
to think that, given a sequence of coding schemes for an AWGN network whose error probability decays
fast to 0, no outer code will be necessary.

Another important point about the techniques we introduce is that the only information about the
actual noise distributions that is required for the coding scheme construction is the mean and the
variance. This means that, given a wireless network with unknown noise distributions where only the mean
and variance can be measured, it is possible to construct a sequence of coding schemes that achieves the 
capacity of the corresponding AWGN network. 

Finally, we notice that the result in Theorem 3 is interesting in itself, since it implies that the capacity
region when we restrict ourselves to coding schemes with finite reading precision and allow the precision to
tend to infinity along the sequence of coding schemes is equal to the unrestricted capacity. In fact,
it is not difficult to change the proof of the theorem in order to prove that $C^{(m)}$, the capacity region
when we restrict ourselves to coding schemes where only $m$ bits after the decimal point are available,
converges to the unrestricted capacity region $C$, as $m \to \infty$. Since in any practical wireless system the
analog received signals must go through an ADC, this result essentially implies that by increasing the
resolution of the ADCs used in a wireless network, the capacity region of the practical system is indeed
approaching the capacity region of the usual infinite-precision models used in the study of wireless
networks.
A Appendix

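Lemma 4. Let $\lambda$ denote the Lebesgue measure. Then, for any convex set $S$, $\lambda(\partial S) = 0$.

Proof. Consider any point $p \in \partial S$. Clearly, $p \notin S^\circ$, and by the Supporting Hyperplane Theorem [9],
there exists a hyperplane that passes through $p$ and contains $S$ in one of its closed half-spaces. Let $H$ be
such a closed half-space. Since $H$ is closed, it is clear that $\partial S \subset H$. Then, for any closed ball $B_\epsilon(p)$
centered at $p$, it is clear that

$$\frac{\lambda\left(B_\epsilon(p) \cap \partial S\right)}{\lambda\left(B_\epsilon(p)\right)} \le \frac{\lambda\left(B_\epsilon(p) \cap H\right)}{\lambda\left(B_\epsilon(p)\right)} = \frac{1}{2}.$$

By Lebesgue's Density Theorem, the set

$$P = \left\{ p \in \partial S : \liminf_{\epsilon \to 0} \frac{\lambda\left(B_\epsilon(p) \cap \partial S\right)}{\lambda\left(B_\epsilon(p)\right)} < 1 \right\}$$

should have Lebesgue measure zero. But since $P = \partial S$, we conclude that $\lambda(\partial S) = 0$. □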
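The half-space bound at the heart of this proof can be illustrated on a simple convex set. The sketch below is an assumption-laden example rather than part of the proof: it takes $S$ to be the closed unit disk in the plane, picks the boundary point $p = (1, 0)$ with supporting half-plane $H = \{x \le 1\}$, and estimates on a grid what fraction of a small ball around $p$ lies in $S$ and in $H$. The function name and the grid construction are hypothetical.

```python
def boundary_fractions(eps=0.05, n=200):
    """Grid estimate of the fractions of the ball B_eps(p) lying in S and in H.

    S is the closed unit disk, p = (1, 0) is a boundary point, and H = {x <= 1}
    is the closed half-plane given by the supporting line at p. n must be even
    so that no grid point falls on the line x = 1.
    """
    in_ball = in_disk = in_half = 0
    for i in range(n):
        u = 2 * i + 1 - n            # odd integer offset; x = 1 + eps * u / n
        for j in range(n):
            v = 2 * j + 1 - n        # odd integer offset; y = eps * v / n
            if u * u + v * v > n * n:
                continue             # grid point lies outside the ball around p
            in_ball += 1
            x, y = 1.0 + eps * u / n, eps * v / n
            if x * x + y * y <= 1.0:
                in_disk += 1
            if u < 0:
                in_half += 1
    return in_disk / in_ball, in_half / in_ball
```

Since $S$ is contained in $H$, the fraction of the ball occupied by $S$ (and hence by its boundary) can never exceed the one half occupied by $H$, so no boundary point can be a density point of $\partial S$.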

Lemma 3. Suppose $Y$ is a random variable with density $f$. Let $Y_m = \lfloor Y \rfloor_m + U_m$, where $U_m$ is
uniformly distributed in $(-2^{-m-1}, 2^{-m-1})$ and independent from $Y$. Then each $Y_m$ has a density $f_m$,
and $f_m$ converges pointwise almost everywhere to $f$.

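Proof. Since the density of $U_m \sim \mathcal{U}(-2^{-m-1}, 2^{-m-1})$ is $g(x) = 2^m \mathbb{1}\{x \in (-2^{-m-1}, 2^{-m-1})\}$, $Y_m$ will
have a density that can be written, for almost all $y$, as

$$\begin{aligned}
f_m(y) &= E\left[g\left(y - \lfloor Y \rfloor_m\right)\right] = 2^m E\left[\mathbb{1}\left\{y - \lfloor Y \rfloor_m \in (-2^{-m-1}, 2^{-m-1})\right\}\right] \\
&= 2^m \Pr\left[y - \lfloor Y \rfloor_m \in (-2^{-m-1}, 2^{-m-1})\right] \\
&= 2^m \Pr\left[\lfloor Y \rfloor_m \in (y - 2^{-m-1}, y + 2^{-m-1})\right] \\
&= 2^m \Pr\left[\lfloor 2^m Y \rfloor \in (y 2^m - 1/2, y 2^m + 1/2)\right] \\
&= 2^m \Pr\left[2^m Y \in \left(\lceil y 2^m - 1/2 \rceil, \lceil y 2^m + 1/2 \rceil\right)\right] \\
&= 2^m \Pr\left[Y \in \left(2^{-m} \lceil y 2^m - 1/2 \rceil, 2^{-m} \lceil y 2^m + 1/2 \rceil\right)\right] \\
&= 2^m \int_{a_m}^{b_m} f(x)\, dx, \qquad (19)
\end{aligned}$$

where $a_m = 2^{-m} \lceil y 2^m - 1/2 \rceil$ and $b_m = 2^{-m} \lceil y 2^m + 1/2 \rceil$. Notice that we can write $b_m = a_m + 2^{-m}$.
Moreover, we have that

$$y - 2^{-(m+1)} \le a_m < y + 2^{-(m+1)}, \qquad (20)$$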
from which we have $a_m \to y$ as $m \to \infty$. If we let $F(y)$ be the cdf of $Y$, then (19) can be written as

$$\frac{F(b_m) - F(a_m)}{2^{-m}} = \frac{F(a_m + 2^{-m}) - F(a_m)}{2^{-m}} = q_m. \qquad (21)$$

Our goal is to show that $q_m$ converges to $f(y)$ as $m \to \infty$ for almost all $y$. Since by assumption $Y$ has
an absolutely continuous distribution, $F(y)$ is differentiable almost everywhere, so it suffices to show
that $q_m$ converges to $f(y)$ as $m \to \infty$ wherever $F(y)$ is differentiable and the derivative is $f(y)$. Thus,
we focus on a $y$ where $F'(y) = f(y)$. Suppose by contradiction that $q_m$ does not converge to $f(y)$.
Then there must be an $\epsilon > 0$ and a subsequence $\{q_{m_i}\}_{i=1}^{\infty}$ such that one of the following

$$q_{m_i} > f(y) + \epsilon \qquad (22)$$
$$q_{m_i} < f(y) - \epsilon \qquad (23)$$

holds for all $i \ge 1$. Suppose wlog that we have a subsequence $\{q_{m_i}\}_{i=1}^{\infty}$ for which (22) holds for all
$i \ge 1$. We will now pick a further subsequence of $\{q_{m_i}\}_{i=1}^{\infty}$ in the following way. First, we choose
$K \in \mathbb{Z}_+$ large enough so that $f(y)/K < \epsilon$, and we define $K$ subsets of $\{1, 2, \dots\}$ as

$$S_j = \left\{ i \ge 1 : y - 2^{-(m_i+1)} + \frac{j-1}{K} 2^{-m_i} \le a_{m_i} < y - 2^{-(m_i+1)} + \frac{j}{K} 2^{-m_i} \right\},$$

for $j = 1, 2, \dots, K$. From (20), the sets $S_1, \dots, S_K$ partition $\{1, 2, \dots\}$, and we must be able to find some
$S_j$ that is infinite. Suppose $|S_t| = \infty$. Then we have a subsequence $\{q_{m_i}\}_{i \in S_t}$, which we re-index as
$\{q_{\ell_i}\}_{i=1}^{\infty}$. For each of the elements in this subsequence we have

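$$\begin{aligned}
q_{\ell_i} &= \frac{F(a_{\ell_i} + 2^{-\ell_i}) - F(a_{\ell_i})}{2^{-\ell_i}} = \frac{F(a_{\ell_i} + 2^{-\ell_i}) - F(y)}{2^{-\ell_i}} + \frac{F(y) - F(a_{\ell_i})}{2^{-\ell_i}} \\
&= \frac{a_{\ell_i} + 2^{-\ell_i} - y}{2^{-\ell_i}} \cdot \frac{F(a_{\ell_i} + 2^{-\ell_i}) - F(y)}{a_{\ell_i} + 2^{-\ell_i} - y} + \frac{y - a_{\ell_i}}{2^{-\ell_i}} \cdot \frac{F(y) - F(a_{\ell_i})}{y - a_{\ell_i}} \\
&\overset{(\dagger)}{\le} \frac{2^{-\ell_i}(1 + t/K - 1/2)}{2^{-\ell_i}} \cdot \frac{F(a_{\ell_i} + 2^{-\ell_i}) - F(y)}{a_{\ell_i} + 2^{-\ell_i} - y} + \frac{2^{-\ell_i}(1/2 - (t-1)/K)}{2^{-\ell_i}} \cdot \frac{F(y) - F(a_{\ell_i})}{y - a_{\ell_i}} \\
&= (t/K + 1/2)\, \frac{F(a_{\ell_i} + 2^{-\ell_i}) - F(y)}{a_{\ell_i} + 2^{-\ell_i} - y} + (1/2 - (t-1)/K)\, \frac{F(y) - F(a_{\ell_i})}{y - a_{\ell_i}}, \qquad (24)
\end{aligned}$$

where $(\dagger)$ follows since $F(y)$ is non-decreasing and $a_{\ell_i}$ satisfies the inequalities defining $S_t$. Now, notice
that the right-hand side in (24) has a limit, and, by taking the lim sup, we obtain

$$\limsup_{i \to \infty} q_{\ell_i} \le (t/K + 1/2) f(y) + (1/2 - (t-1)/K) f(y) = \left(1 + \frac{1}{K}\right) f(y) < f(y) + \epsilon.$$

But this is a contradiction because all $q_{m_i}$ satisfied $q_{m_i} > f(y) + \epsilon$, and $\{q_{\ell_i}\}_{i=1}^{\infty} \subset \{q_{m_i}\}_{i=1}^{\infty}$. We
conclude that we must have

$$\lim_{m \to \infty} q_m = f(y),$$

which implies that $f_m(y) \to f(y)$ as $m \to \infty$. □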